Add ResearchClawBench eval framework #2174

Merged: krampstudio merged 3 commits into huggingface:main from black-yt:add-researchclawbench-eval-framework on May 27, 2026

black-yt (Contributor) commented May 15, 2026

Summary

Adds researchclawbench to the supported evaluation frameworks for benchmark dataset eval.yaml files.

ResearchClawBench is an end-to-end scientific research benchmark for AI agents and standalone LLMs, covering workflows from reading raw data and related work to producing code, figures, and publication-style reports.

Dataset prepared for the Hub Evaluation Results feature:
https://huggingface.co/datasets/InternScience/ResearchClawBench

The dataset repo already includes:

  • eval.yaml with evaluation_framework: researchclawbench
  • .eval_results/*.yaml entries following the benchmark result format

For reference, a similar benchmark setup:
https://huggingface.co/datasets/claw-eval/Claw-Eval

Change

  • Add researchclawbench to EVALUATION_FRAMEWORKS in packages/tasks/src/eval.ts.
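
As a rough illustration of what this kind of change looks like, here is a minimal sketch of a static registry with the new key. It is not copied from packages/tasks/src/eval.ts: the field names (`name`, `description`, `url`), the placeholder `example-framework` entry, and the `isEvaluationFramework` helper are all assumptions made for the example.

```typescript
// Hypothetical sketch of a static framework registry. The entry shape
// (name, description, url) is an assumption, not the real file's contents.
const EVALUATION_FRAMEWORKS = {
	// Placeholder standing in for the frameworks that already exist.
	"example-framework": {
		name: "example-framework",
		description: "Placeholder entry for an existing framework.",
		url: "https://example.com",
	},
	// The kind of entry this PR adds.
	researchclawbench: {
		name: "researchclawbench",
		description: "End-to-end scientific research benchmark for AI agents and standalone LLMs.",
		url: "https://huggingface.co/datasets/InternScience/ResearchClawBench",
	},
} as const;

type EvaluationFramework = keyof typeof EVALUATION_FRAMEWORKS;

// Hypothetical helper: narrows an arbitrary string (for example the
// evaluation_framework value read from an eval.yaml) to a known key.
// hasOwnProperty avoids matching inherited keys such as "toString".
function isEvaluationFramework(value: string): value is EvaluationFramework {
	return Object.prototype.hasOwnProperty.call(EVALUATION_FRAMEWORKS, value);
}

console.log(isEvaluationFramework("researchclawbench")); // true
console.log(isEvaluationFramework("not-a-framework")); // false
```

Because the registry is a plain constant, a new key only widens the set of accepted `evaluation_framework` values, which is consistent with the change touching no execution flow.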

Notes

This is intended to allow the ResearchClawBench dataset to be recognized as a Benchmark dataset and display the benchmark leaderboard/tag on the Hub.


Note

Low Risk
Low risk: adds a new entry to a static EVALUATION_FRAMEWORKS registry with no changes to execution flow or data handling.

Overview
Adds ResearchClawBench to the EVALUATION_FRAMEWORKS map in packages/tasks/src/eval.ts, enabling benchmark datasets to declare evaluation_framework: researchclawbench and be recognized accordingly.

Reviewed by Cursor Bugbot for commit 36c6e24. Bugbot is set up for automated code reviews on this repo.

black-yt (Contributor, Author) commented

@SBrandeis @Wauplin @gary149 @julien-c @ngxson @pcuenca

Just following up on this PR in case it was missed.

This change only adds researchclawbench to the supported evaluation frameworks so that the dataset can be recognized as a Benchmark dataset on the Hub and display the benchmark leaderboard/tag correctly.

The dataset repo and evaluation result files are already prepared:

Please let me know if there are any additional requirements or adjustments needed from my side. Thanks!

krampstudio (Collaborator) left a comment

seen with @NathanHB

krampstudio merged commit 7ef4d94 into huggingface:main on May 27, 2026
6 checks passed